table 4
Clustering based on Stochastic Dominance with application for risk averters and risk seekers
Li, Hua, Jia, Xue, Kang, Yilin, Wong, Wing-Keung
Stock clustering algorithms play a pivotal role in quantitative finance and the asset management industry, serving as a core mechanism for understanding market complexity and preselecting assets. Their value lies in helping investors identify the underlying structure of the stock market by grouping stocks with similar return characteristics or risk profiles. This data-driven market segmentation reduces the dimensionality of the portfolio construction problem and provides a foundation for formulating differentiated investment strategies.
A review of the existing literature shows that domestic and international scholars have produced a substantial body of work on stock clustering. Traditional clustering research predominantly employs classic machine learning algorithms: Xiaojun (2019) and Wu et al. (2022) used the K-means algorithm for stock partitioning; Huang et al. (2010) and Lu et al. (2020) explored the sectoral structures of the SSE 50 Index and other markets using Agglomerative Hierarchical Clustering (AHC) and Spectral Clustering; Korzeniewski (2018) introduced the Partitioning Around Medoids (PAM) algorithm to construct portfolios with enhanced risk resistance. In recent years, with the advancement of deep learning, Lúcio and Caiado (2022) and Siregar and Yosia (2024) have incorporated time-series models (such as TGARCH) or specific market features (e.g., Indonesian stock data) into clustering frameworks. Despite their respective merits in capturing market trends, these methods share a common limitation: they rely almost exclusively on stock-specific information (e.g., price, volatility, or financial metrics) and neglect the heterogeneity of market participants, namely the investors.
In reality, investors are typically categorized into three distinct types based on their risk preferences: risk-averse, risk-seeking, and risk-neutral. Divergent risk attitudes inevitably lead to fundamentally different asset selection logic.
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment
Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of natural language tasks. However, a key limitation is that these language models fundamentally lack grounding in visual perception, a crucial attribute needed to extend them to real-world tasks such as visual question answering and robotics. While prior works have largely connected image to text through pretraining or fine-tuning, learning such alignments is generally costly because it requires both curating massive datasets and large amounts of computation. To resolve these limitations, we propose a simple yet effective approach called Language-Quantized AutoEncoder (LQAE), a modification of VQ-VAE that learns to align text-image data in an unsupervised manner by leveraging pretrained language model denoisers (e.g., BERT). Our main idea is to encode images as sequences of text tokens by directly quantizing image embeddings using a pretrained language codebook. We then feed a masked version of the quantized embeddings into a BERT model to reconstruct the original input. By doing so, LQAE learns to represent similar images with similar clusters of text tokens, thereby aligning these two modalities without the use of aligned text-image pairs. We show that LQAE learns text-aligned image tokens that enable few-shot multi-modal learning with large language models, outperforming baseline methods in tasks such as image classification and VQA while requiring as few as 1-10 image-text pairs.
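The quantization and masking steps described above can be illustrated with a toy sketch. It is not the LQAE implementation: the four-entry `codebook`, the `vocab` list, the use of squared Euclidean distance, and the function names `quantize` and `mask_tokens` are all assumptions made for illustration, and the BERT reconstruction step is omitted entirely.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(image_embeddings, codebook):
    """Map each image embedding to the index of its nearest codebook row.

    In this sketch the codebook stands in for a frozen, pretrained
    token-embedding table, so each index can be read as a text token id.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(e, codebook[k]))
        for e in image_embeddings
    ]

def mask_tokens(token_ids, mask_id, mask_prob, rng):
    """Replace each token id with mask_id independently with probability mask_prob."""
    return [mask_id if rng.random() < mask_prob else t for t in token_ids]

# Hypothetical toy vocabulary and 2-D "token embeddings" (assumed values).
vocab = ["[MASK]", "cat", "dog", "car"]
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical encoder outputs for three image patches (assumed values).
image_embeddings = [[0.9, 0.1], [0.1, 0.8], [0.9, 0.95]]

token_ids = quantize(image_embeddings, codebook)
masked_ids = mask_tokens(token_ids, mask_id=0, mask_prob=0.5, rng=random.Random(0))
```

In a full model, the masked sequence would be passed to the language model denoiser, and its reconstruction would drive the training of the image encoder and decoder.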
Supplementary Material for Enhancing Robotic Program Synthesis Through Environmental Context
The hardware employed consisted of 24 Intel(R) Xeon(R) Gold 5317 CPUs @ 3.00GHz, 8 modules of 32GB memory (with a speed of 3200MT/s), and 2 NVIDIA A40 GPUs with 48GB of memory each (NVIDIA UNIX x86_64 Kernel Module 510.108.03, CUDA version 11.6, cuDNN version 8.3).
A.2 Network Architecture
For the program synthesizing stage, the structure of the I/O encoder is elaborated in Table 1, where we employ d_k1 × d_k2-s-d_o Conv to denote the 2D convolution with kernel size d_k1 × d_k2, stride s, and output channel d_o. Additionally, BN refers to batch normalization [8], and d_i-d_o Linear denotes the fully-connected layer with input feature d_i and output feature d_o. The I/O encoder utilizes residual networks [7] and takes I/O pairs of size 5 × 5 × 3 as inputs.
To improve candidate programs through environmental contexts, the decoder's structure is elaborated in Table 2. Here, we utilize d_o-h GATv2Conv to represent the dynamic graph attention variant [1] with output channel d_o and multiple attention heads h, and d_o-n_l denotes the n_l-layered bi-directional LSTM with output feature d_o.
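As a reading aid for the Conv shorthand used for the I/O encoder, the following sketch parses a spec string and applies the standard convolution output-size formula to a 5 × 5 input. The spec `"3x3-1-64"`, the ASCII `x` separator, the padding values, and the names `ConvSpec`, `parse_conv`, and `output_shape` are hypothetical; the actual layer settings are those listed in Table 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConvSpec:
    kernel_h: int
    kernel_w: int
    stride: int
    out_channels: int

def parse_conv(spec: str) -> ConvSpec:
    """Parse a 'KHxKW-S-DO' string, e.g. '3x3-1-64' (hypothetical example)."""
    kernel, stride, out_channels = spec.split("-")
    kernel_h, kernel_w = kernel.split("x")
    return ConvSpec(int(kernel_h), int(kernel_w), int(stride), int(out_channels))

def output_shape(height: int, width: int, spec: ConvSpec, padding: int = 0):
    """Standard conv output size: floor((n + 2p - k) / s) + 1 per spatial axis."""
    out_h = (height + 2 * padding - spec.kernel_h) // spec.stride + 1
    out_w = (width + 2 * padding - spec.kernel_w) // spec.stride + 1
    return (out_h, out_w, spec.out_channels)

spec = parse_conv("3x3-1-64")                 # assumed layer, not taken from Table 1
same_size = output_shape(5, 5, spec, padding=1)   # padding=1 is an assumption
shrunk = output_shape(5, 5, spec, padding=0)
```

With a 3 × 3 kernel and stride 1, a padding of 1 preserves the 5 × 5 spatial size, which is the usual choice when residual connections add the input back to the output.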
A Hand-Crafted Example
The code for our experiments is available at https://github.com/AndyShih12/HyperSPN. To examine the merits of HyperSPNs as discussed in Section 3, we construct a hand-crafted dataset to test the three types of models described in Figure 4: SPN-Large, SPN-Small, and HyperSPN. The hand-crafted dataset is procedurally generated with 256 binary variables and 10000 instances, split into train/valid/test sets at 70%/10%/20%. The generation procedure is designed such that the correlation between variables i and j depends on the path length between leaves i and j of a complete binary tree over the 256 variables. The exact details can be found in our code.
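The two quantities named above, the leaf-to-leaf path length and the split sizes, can be sketched as follows. This is not the authors' generation procedure (how path length maps to correlation is defined only in their repository); the 0-based leaf indexing, the shuffling seed, and the function names `leaf_path_length` and `split_indices` are assumptions.

```python
import random

NUM_VARIABLES = 256    # leaves of a complete binary tree of depth 8
NUM_INSTANCES = 10000

def leaf_path_length(i: int, j: int) -> int:
    """Number of edges on the path between leaves i and j (0-based indices).

    The highest bit in which i and j differ gives the height of their lowest
    common ancestor above the leaf level; the path climbs that many edges
    and descends the same number.
    """
    return 2 * (i ^ j).bit_length()

def split_indices(n: int, percents=(70, 10, 20), seed: int = 0):
    """Shuffle range(n) and cut it into train/valid/test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = n * percents[0] // 100
    n_valid = n * percents[1] // 100
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test

train, valid, test = split_indices(NUM_INSTANCES)
farthest = leaf_path_length(0, NUM_VARIABLES - 1)
```

Sibling leaves are 2 edges apart, and leaves in opposite halves of the tree are 16 edges apart, so the path length takes only the even values from 0 to 16.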
We thank all reviewers for their constructive comments and address the raised issues below. As described in Section 3.2 of the manuscript, we introduce the [...] The source code, as mentioned on L141, will be made available to the public.
R1: Why is adaptive flow filtering a better way of reducing artifacts? Our method can be seen as a learnable median filter in spirit. Although the quantitative improvement from the adaptive flow filtering (ada.) is small, this component is important for generating results with higher visual quality. SepConv was originally trained on high-quality videos with large motion.